fix: mask signal during CHAIN_AT_START to survive chained handler re-raises by jpnurmi · Pull Request #1572 · getsentry/sentry-native

jpnurmi · 2026-03-11T12:33:42Z

Summary

Mask the signal before invoking the chained handler so SA_NODEFER doesn't let re-raises kill the process
Use raw rt_sigprocmask syscall to bypass Android's libsigchain
After the chain: reinstall our handler if it was reset to SIG_DFL, consume any pending signal via sigtimedwait, and unmask

Context

With SA_NODEFER, the chained handler (e.g. Mono) can reset the signal handler to SIG_DFL and re-raise. The re-raised signal is delivered immediately and kills the process before inproc can capture the crash.

On Android, sigprocmask is intercepted by libsigchain which silently drops crash signals, so a raw syscall is needed.

SA_NODEFER (added in #1446) is incompatible with the CHAIN_AT_START signal handler strategy. When chaining to the runtime's signal handler (e.g. Mono), the runtime may reset the signal to SIG_DFL and re-raise. With SA_NODEFER the re-raised signal is delivered immediately, killing the process before our handler can regain control. Without SA_NODEFER, the re-raised signal is blocked during handler execution, allowing the runtime handler to return and sentry-native to proceed with crash capture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

supervacuus · 2026-03-11T13:06:12Z

Mono's mono_handle_native_crash resets the crashing signal to SIG_DFL and re-raises it as part of its crash handling flow. With SA_NODEFER, the re-raised signal is delivered immediately

Yeah, this makes sense, but is there no Mono test in the downstream integration tests? It is quite painful that the two runtimes differ so severely, given that we seem to lack any early warning for that particular config. Or did this only happen with .NET 10?

Recursive crash detection (the reason SA_NODEFER was added) is not critical for CHAIN_AT_START because:

Not critical is an understatement. It is as critical as with all other use cases, when the signal actually comes from code that the Native SDK should handle.

Wouldn't it be better to just mask the incoming signal before we invoke the handler at start? This way, a raise inside the .NET handler is blocked until we sigreturn, and we retain the recursive reentrancy semantics of our handler (which are critical because we execute user-provided code from the handler). Disable the unmask once we know the .NET handler didn't feel responsible for the signal.

supervacuus · 2026-03-11T13:16:25Z

src/backends/sentry_backend_inproc.c

+    g_sigaction.sa_flags = SA_SIGINFO | SA_ONSTACK;
+    if (g_backend_config.handler_strategy
+        != SENTRY_HANDLER_STRATEGY_CHAIN_AT_START) {
+        g_sigaction.sa_flags |= SA_NODEFER;
+    }


Tbh, this feels quite like the hammer.

Could you test an approach like this inside the chain-at-start block of process_ucontext()

sigset_t mask, old_mask; sigemptyset(&mask); sigaddset(&mask, uctx->signum); sigprocmask(SIG_BLOCK, &mask, &old_mask); invoke_signal_handler(uctx->signum, uctx->siginfo, (void *)uctx->user_context); if (ip != get_instruction_pointer(uctx) || sp != get_stack_pointer(uctx)) { // No need to restore the signal mask here: sigreturn will // restore it from the saved ucontext. return; } // once we know we own the signal, unmask again sigprocmask(SIG_SETMASK, &old_mask, NULL);

Thanks for the suggestion. I tested it on an emulator and faced a couple of issues:

sigprocmask seems to be intercepted by libsigchain on Android. A raw syscall(SYS_rt_sigprocmask, SIG_BLOCK, ...) works, though.

unmasking delivers a pending SIGSEGV and kills the process

Mono restores SIG_DFL and re-raises SIGSEGV:

sigaction(SIGSEGV, &saved_default_handler, NULL); raise(SIGSEGV);

With the signal blocked, the raise() creates a pending signal instead of being delivered immediately. When unmasked, the pending SIGSEGV is delivered, but the handler is now SIG_DFL, so it terminates the process before sentry-native can capture the crash.

I guess leaving it unmasked would have more or less the same trade-off as the original SA_NODEFER removal approach, basically losing recursive crash detection. What if we would restore our handler, and consume the the pending signal with sigtimedwait?

sigset_t mask, old_mask; sigemptyset(&mask); sigaddset(&mask, uctx->signum); // bypass libsigchain on Android syscall(SYS_rt_sigprocmask, SIG_BLOCK, &mask, &old_mask, sizeof(sigset_t)); invoke_signal_handler(uctx->signum, uctx->siginfo, (void *)uctx->user_context); if (ip != get_instruction_pointer(uctx) || sp != get_stack_pointer(uctx)) { return; } // restore our handler struct sigaction current; sigaction(uctx->signum, NULL, &current); if (current.sa_handler == SIG_DFL) { sigaction(uctx->signum, &g_sigaction, NULL); } // consume pending signal struct timespec timeout = { 0, 0 }; sigtimedwait(&mask, NULL, &timeout); // unmask syscall(SYS_rt_sigprocmask, SIG_SETMASK, &old_mask, NULL, sizeof(sigset_t));

1. sigprocmask seems to be intercepted by libsigchain on Android. A raw syscall(SYS_rt_sigprocmask, SIG_BLOCK, ...) works, though.

What is the observed effect? If libsigchain is the callee of a sigprocmask() call and it happens from inside the signal handler, then it normally passes it on to the linked sigprocmask(), so yes... this will go through libsigchain, but what effect do you see?!

// restore our handler struct sigaction current; sigaction(uctx->signum, NULL, &current); if (current.sa_handler == SIG_DFL) { sigaction(uctx->signum, &g_sigaction, NULL); } // consume pending signal struct timespec timeout = { 0, 0 }; sigtimedwait(&mask, NULL, &timeout); // unmask syscall(SYS_rt_sigprocmask, SIG_SETMASK, &old_mask, NULL, sizeof(sigset_t));

Do we need to restore our handler? It should be enough to just sigwait()/sigtimedwait() (which are both not async-signal-safe), since they both just block the calling thread until the pending signal arrives (which in our case is already there). Or do you mean for the case of reentering?

What is the observed effect? If libsigchain is the callee of a sigprocmask() call and it happens from inside the signal handler, then it normally passes it on to the linked sigprocmask(), so yes... this will go through libsigchain, but what effect do you see?!

The intercepted sigprocmask call returns 0 indicating success, but the signal isn't blocked. Works fine with a raw syscall.

Do we need to restore our handler? It should be enough to just sigwait()/sigtimedwait() (which are both not async-signal-safe), since they both just block the calling thread until the pending signal arrives (which in our case is already there). Or do you mean for the case of reentering?

Yes, it's for trying to preserve the recursive crash detection with SA_NODEFER.

The intercepted sigprocmask call returns 0 indicating success, but the signal isn't blocked. Works fine with a raw syscall.

Could be this: https://cs.android.com/android/platform/superproject/main/+/main:art/sigchainlib/sigchain.cc;l=653-661

Yes, but we shouldn't even be reaching that point, because right before that, libsigchain checks whether we are in the signal handler (via TLS), and directly calls the "linked" sigprocmask.

The check for claimed signals is only relevant outside of signal handlers, because it could prevent special handlers installed by libsigchain from being blocked.

But sigprocmask from within the handler would only defer delivery until the current signal handler leaves. At the point where any of our signal handlers or the .NET signal handler is invoked, all special handlers have already been invoked.

This reverts commit 91afd1a.

…ng process With SA_NODEFER, the chained handler's re-raise is delivered immediately and kills the process before we regain control. Mask the signal via raw rt_sigprocmask (to bypass Android's libsigchain), then after the chain: reinstall our handler if it was reset to SIG_DFL, consume any pending signal with sigtimedwait, and unmask. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jpnurmi · 2026-03-11T16:42:19Z

Yeah, this makes sense, but is there no Mono test in the downstream integration tests? It is quite painful that the two runtimes differ so severely, given that we seem to lack any early warning for that particular config. Or did this only happen with .NET 10?

This happens with both .NET 9 and .NET 10. Unfortunately, the PR that takes chained signal handling into use and would have revealed this problem, has been on hold until now due to other issues in .NET 10.

feat: native signal handler strategy on Android sentry-dotnet#4676

src/backends/sentry_backend_inproc.c

sigtimedwait is not declared without _POSIX_C_SOURCE >= 199309L, so use the raw SYS_rt_sigtimedwait syscall instead. Also replace sizeof(sigset_t) with _NSIG/8 since the kernel expects 8 bytes, not glibc's 128. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sentry · 2026-03-11T17:12:21Z

src/backends/sentry_backend_inproc.c

+        struct timespec timeout = { 0, 0 };
+        syscall(SYS_rt_sigtimedwait, &mask, NULL, &timeout, _NSIG / 8);


Bug: The function returns early if the chained handler modifies ip or sp, but it does so without unmasking the signal or consuming the pending signal raised by the handler.
_{Severity: HIGH}

Suggested Fix

Before the early return, add the necessary calls to consume the pending signal and restore the original signal mask. Specifically, call syscall(SYS_rt_sigtimedwait, ...) with a zero timeout to consume the signal, and then call syscall(SYS_rt_sigprocmask, SIG_SETMASK, &old_mask, ...) to unmask it.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/backends/sentry_backend_inproc.c#L1598-L1599 Potential issue: In the signal handler, a signal is masked before invoking a chained handler. If this chained handler re-raises the signal (e.g., via `raise(signum)`) and also modifies the instruction pointer (`ip`) or stack pointer (`sp`), the function will `return` early. This early return path skips the logic that consumes the now-pending signal with `sigtimedwait` and unmasks it with `sigprocmask`. This leaves the process with a pending signal that is still masked, preventing its delivery and potentially causing undefined behavior or crashes later on.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-11T17:21:22Z

src/backends/sentry_backend_inproc.c

+        sigemptyset(&mask);
+        sigaddset(&mask, uctx->signum);
+        // raw syscall to bypass libsigchain on Android
+        syscall(SYS_rt_sigprocmask, SIG_BLOCK, &mask, &old_mask, _NSIG / 8);


Raw syscall buffer overflow on 32-bit Android

High Severity

On 32-bit Android (LP32), bionic's sigset_t is only 4 bytes, but _NSIG / 8 evaluates to 8. The raw syscall(SYS_rt_sigprocmask, SIG_BLOCK, &mask, &old_mask, _NSIG / 8) tells the kernel to write 8 bytes into the 4-byte old_mask buffer, causing a stack buffer overflow. The project explicitly targets armeabi-v7a and x86 (32-bit ABIs). Bionic's libc wrapper normally handles this size mismatch via an internal kernel_sigset_t union, but the raw syscall bypasses that conversion.

Additional Locations (2)

src/backends/sentry_backend_inproc.c#L1598-L1599

src/backends/sentry_backend_inproc.c#L1601-L1602

jpnurmi and others added 2 commits March 11, 2026 13:27

Update CHANGELOG.md

ab26763

jpnurmi force-pushed the jpnurmi/fix/sa-nodefer-chain-at-start branch from 89200fa to ab26763 Compare March 11, 2026 12:40

jpnurmi requested a review from supervacuus March 11, 2026 12:52

supervacuus requested changes Mar 11, 2026

View reviewed changes

jpnurmi and others added 3 commits March 11, 2026 14:27

Revert "fix: skip SA_NODEFER when CHAIN_AT_START is active"

66b86a6

This reverts commit 91afd1a.

Update CHANGELOG.md

57a447a

jpnurmi changed the title ~~fix: skip SA_NODEFER when CHAIN_AT_START is active~~ fix: mask signal during CHAIN_AT_START to survive chained handler re-raises Mar 11, 2026

sentry bot reviewed Mar 11, 2026

View reviewed changes

src/backends/sentry_backend_inproc.c Show resolved Hide resolved

cursor bot reviewed Mar 11, 2026

View reviewed changes

src/backends/sentry_backend_inproc.c Outdated Show resolved Hide resolved

sentry bot reviewed Mar 11, 2026

View reviewed changes

cursor bot reviewed Mar 11, 2026

View reviewed changes

		struct timespec timeout = { 0, 0 };
		syscall(SYS_rt_sigtimedwait, &mask, NULL, &timeout, _NSIG / 8);

Uh oh!

Conversation

jpnurmi commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Context

Uh oh!

supervacuus commented Mar 11, 2026

Uh oh!

supervacuus Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

jpnurmi Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpnurmi Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

supervacuus Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpnurmi Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

jpnurmi Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

supervacuus Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

jpnurmi commented Mar 11, 2026

Uh oh!

Uh oh!

Uh oh!

sentry bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Mar 11, 2026

Choose a reason for hiding this comment

Raw syscall buffer overflow on 32-bit Android

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jpnurmi commented Mar 11, 2026 •

edited

Loading

jpnurmi Mar 11, 2026 •

edited

Loading

supervacuus Mar 11, 2026 •

edited

Loading